Overview

Dataset statistics

Number of variables: 19
Number of observations: 319749
Missing cells: 9339
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 42.7 MiB
Average record size in memory: 140.0 B
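
The derived figures in this table can be checked from the raw counts. The sketch below is only an arithmetic check and assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 1024 × 1024 bytes; the variable names are illustrative.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 19
n_observations = 319_749
missing_cells = 9_339
total_size_mib = 42.7

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells:         {total_cells}")
print(f"missing cells (%):   {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the rounded results agree with the reported 0.2% and 140.0 B.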

Variable types

BOOL: 11
NUM: 8

Warnings

Diabetic has 9339 (2.9%) missing values (Missing)
df_index has unique values (Unique)
PhysicalHealth has 226557 (70.9%) zeros (Zeros)
MentalHealth has 205367 (64.2%) zeros (Zeros)
Race has 5201 (1.6%) zeros (Zeros)
GenHealth has 66832 (20.9%) zeros (Zeros)

Reproduction

Analysis started: 2022-05-06 14:04:39.247276
Analysis finished: 2022-05-06 14:05:15.469230
Duration: 36.22 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 319749
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 159897.1634
Minimum: 0
Maximum: 319794
Zeros: 1
Zeros (%): < 0.1%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 15990.4
Q1: 79949
Median: 159896
Q3: 239844
95-th percentile: 303805.6
Maximum: 319794
Range: 319794
Interquartile range (IQR): 159895

Descriptive statistics

Standard deviation: 92316.78096
Coefficient of variation (CV): 0.5773509609
Kurtosis: -1.199977999
Mean: 159897.1634
Median Absolute Deviation (MAD): 79948
Skewness: 1.558113976e-05
Sum: 5.112695811e+10
Variance: 8522388046
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
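
As a rough illustration of what "fixed size bins" means, the sketch below splits the range between the minimum and maximum into equal-width intervals and counts the values in each. It is a minimal stand-in written for this report, not the plotting code the profiling tool uses; the function name `fixed_width_histogram` is hypothetical.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so use a single bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum falls on the right edge; clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Ten evenly spaced values split into five bins of equal width.
print(fixed_width_histogram(list(range(10)), bins=5))
```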
Common values
Value                  Count   Frequency (%)
0                      1       < 0.1%
213195                 1       < 0.1%
213202                 1       < 0.1%
213201                 1       < 0.1%
213200                 1       < 0.1%
213199                 1       < 0.1%
213198                 1       < 0.1%
213197                 1       < 0.1%
213196                 1       < 0.1%
213194                 1       < 0.1%
Other values (319739)  319739  > 99.9%

Minimum 5 values
Value   Count  Frequency (%)
0       1      < 0.1%
1       1      < 0.1%
2       1      < 0.1%
3       1      < 0.1%
4       1      < 0.1%

Maximum 5 values
Value   Count  Frequency (%)
319794  1      < 0.1%
319793  1      < 0.1%
319792  1      < 0.1%
319791  1      < 0.1%
319790  1      < 0.1%

HeartDisease
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      292383  91.4%
1      27366   8.6%

BMI
Real number (ℝ≥0)

Distinct: 3579
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.32766192
Minimum: 13.02
Maximum: 94.85
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 13.02
5-th percentile: 20.14
Q1: 24.03
Median: 27.34
Q3: 31.44
95-th percentile: 40.18
Maximum: 94.85
Range: 81.83
Interquartile range (IQR): 7.41

Descriptive statistics

Standard deviation: 6.353754415
Coefficient of variation (CV): 0.2242950524
Kurtosis: 3.893894458
Mean: 28.32766192
Median Absolute Deviation (MAD): 3.66
Skewness: 1.33521651
Sum: 9057741.57
Variance: 40.37019517
Monotonicity: Not monotonic
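
Several of these figures follow from one another by simple arithmetic. The sketch below recomputes them from the BMI values reported here, using the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). The helper `median_abs_deviation` is a hypothetical name showing the usual MAD definition on a toy list; whether the report uses exactly this definition is an assumption.

```python
import statistics

# Summary figures copied from the BMI section.
mean = 28.32766192
std = 6.353754415
q1, q3 = 24.03, 31.44
minimum, maximum = 13.02, 94.85

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range


def median_abs_deviation(values):
    """Median of the absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


print(f"CV={cv:.10f} variance={variance:.6f} IQR={iqr:.2f} range={value_range:.2f}")
print(median_abs_deviation([1, 2, 3, 4, 100]))
```

The toy list shows why the MAD is robust: the outlier 100 does not move it, whereas it would inflate the standard deviation.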
Histogram with fixed size bins (bins=50)
Common values
Value                Count   Frequency (%)
26.63                3762    1.2%
27.46                2767    0.9%
27.44                2723    0.9%
24.41                2696    0.8%
27.12                2525    0.8%
25.1                 2262    0.7%
28.7                 1968    0.6%
29.53                1894    0.6%
32.28                1878    0.6%
29.29                1869    0.6%
Other values (3569)  295405  92.4%

Minimum 5 values
Value  Count  Frequency (%)
13.02  2      < 0.1%
13.04  2      < 0.1%
13.08  1      < 0.1%
13.12  1      < 0.1%
13.17  1      < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
94.85  1      < 0.1%
94.66  1      < 0.1%
93.97  1      < 0.1%
93.86  1      < 0.1%
92.53  1      < 0.1%

Smoking
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      187861  58.8%
1      131888  41.2%

AlcoholDrinking
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      297978  93.2%
1      21771   6.8%

Stroke
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      307684  96.2%
1      12065   3.8%

PhysicalHealth
Real number (ℝ≥0)

ZEROS

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.371294359
Minimum: 0
Maximum: 30
Zeros: 226557
Zeros (%): 70.9%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 2
95-th percentile: 30
Maximum: 30
Range: 30
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 7.950176157
Coefficient of variation (CV): 2.358196974
Kurtosis: 5.530001927
Mean: 3.371294359
Median Absolute Deviation (MAD): 0
Skewness: 2.604224887
Sum: 1077968
Variance: 63.20530093
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Common values
Value              Count   Frequency (%)
0                  226557  70.9%
30                 19502   6.1%
2                  14880   4.7%
1                  10488   3.3%
3                  8616    2.7%
5                  7606    2.4%
10                 5452    1.7%
15                 5012    1.6%
7                  4629    1.4%
4                  4467    1.4%
Other values (21)  12540   3.9%

Minimum 5 values
Value  Count   Frequency (%)
0      226557  70.9%
1      10488   3.3%
2      14880   4.7%
3      8616    2.7%
4      4467    1.4%

Maximum 5 values
Value  Count  Frequency (%)
30     19502  6.1%
29     204    0.1%
28     445    0.1%
27     124    < 0.1%
26     66     < 0.1%

MentalHealth
Real number (ℝ≥0)

ZEROS

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.898182637
Minimum: 0
Maximum: 30
Zeros: 205367
Zeros (%): 64.2%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 3
95-th percentile: 30
Maximum: 30
Range: 30
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 7.954713452
Coefficient of variation (CV): 2.040621026
Kurtosis: 4.404563305
Mean: 3.898182637
Median Absolute Deviation (MAD): 0
Skewness: 2.331188413
Sum: 1246440
Variance: 63.27746611
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Common values
Value              Count   Frequency (%)
0                  205367  64.2%
30                 17366   5.4%
2                  16494   5.2%
5                  14149   4.4%
10                 10513   3.3%
3                  10466   3.3%
15                 9896    3.1%
1                  9289    2.9%
7                  5528    1.7%
20                 5430    1.7%
Other values (21)  15251   4.8%

Minimum 5 values
Value  Count   Frequency (%)
0      205367  64.2%
1      9289    2.9%
2      16494   5.2%
3      10466   3.3%
4      5378    1.7%

Maximum 5 values
Value  Count  Frequency (%)
30     17366  5.4%
29     317    0.1%
28     515    0.2%
27     126    < 0.1%
26     59     < 0.1%

DiffWalking
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      275351  86.1%
1      44398   13.9%

Sex
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB

Value  Count   Frequency (%)
0      167782  52.5%
1      151967  47.5%

AgeCategory
Real number (ℝ≥0)

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.35554451
Minimum: 21
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 42
Median: 57
Q3: 67
95-th percentile: 80
Maximum: 80
Range: 59
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 17.72029002
Coefficient of variation (CV): 0.3260070372
Kurtosis: -1.009483352
Mean: 54.35554451
Median Absolute Deviation (MAD): 15
Skewness: -0.3272731296
Sum: 17380131
Variance: 314.0086785
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Common values
Value             Count  Frequency (%)
67                34148  10.7%
62                33682  10.5%
72                31062  9.7%
57                29752  9.3%
52                25376  7.9%
80                24148  7.6%
47                21790  6.8%
77                21476  6.7%
21                21061  6.6%
42                21005  6.6%
Other values (3)  56249  17.6%

Minimum 5 values
Value  Count  Frequency (%)
21     21061  6.6%
27     16953  5.3%
32     18749  5.9%
37     20547  6.4%
42     21005  6.6%

Maximum 5 values
Value  Count  Frequency (%)
80     24148  7.6%
77     21476  6.7%
72     31062  9.7%
67     34148  10.7%
62     33682  10.5%

Race
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.396804994
Minimum: 0
Maximum: 5
Zeros: 5201
Zeros (%): 1.6%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.212139683
Coefficient of variation (CV): 0.2756864778
Kurtosis: 2.666402192
Mean: 4.396804994
Median Absolute Deviation (MAD): 0
Skewness: -1.923831146
Sum: 1405874
Variance: 1.46928261
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Common values
Value  Count   Frequency (%)
5      245182  76.7%
3      27444   8.6%
2      22932   7.2%
4      10926   3.4%
1      8064    2.5%
0      5201    1.6%

Minimum 5 values
Value  Count  Frequency (%)
0      5201   1.6%
1      8064   2.5%
2      22932  7.2%
3      27444  8.6%
4      10926  3.4%

Maximum 5 values
Value  Count   Frequency (%)
5      245182  76.7%
4      10926   3.4%
3      27444   8.6%
2      22932   7.2%
1      8064    2.5%

Diabetic
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 9339
Missing (%): 2.9%
Memory size: 2.4 MiB

Value      Count   Frequency (%)
0          269612  84.3%
1          40798   12.8%
(Missing)  9339    2.9%

PhysicalActivity
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
1      247926  77.5%
0      71823   22.5%

GenHealth
Real number (ℝ≥0)

ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.220932669
Minimum: 0
Maximum: 4
Zeros: 66832
Zeros (%): 20.9%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 4
95-th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.534667273
Coefficient of variation (CV): 0.6910012603
Kurtosis: -1.394283379
Mean: 2.220932669
Median Absolute Deviation (MAD): 2
Skewness: -0.1296501094
Sum: 710141
Variance: 2.35520364
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
Common values
Value  Count   Frequency (%)
4      113849  35.6%
2      93113   29.1%
0      66832   20.9%
1      34673   10.8%
3      11282   3.5%

Minimum 5 values
Value  Count   Frequency (%)
0      66832   20.9%
1      34673   10.8%
2      93113   29.1%
3      11282   3.5%
4      113849  35.6%

Maximum 5 values
Value  Count   Frequency (%)
4      113849  35.6%
3      11282   3.5%
2      93113   29.1%
1      34673   10.8%
0      66832   20.9%

SleepTime
Real number (ℝ≥0)

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.097026105
Minimum: 1
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 6
Median: 7
Q3: 8
95-th percentile: 9
Maximum: 24
Range: 23
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.435732954
Coefficient of variation (CV): 0.2023006444
Kurtosis: 7.842039137
Mean: 7.097026105
Median Absolute Deviation (MAD): 1
Skewness: 0.6774706965
Sum: 2269267
Variance: 2.061329116
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Common values
Value              Count  Frequency (%)
7                  97741  30.6%
8                  97594  30.5%
6                  66709  20.9%
5                  19181  6.0%
9                  16036  5.0%
10                 7793   2.4%
4                  7750   2.4%
12                 2204   0.7%
3                  1992   0.6%
2                  787    0.2%
Other values (14)  1962   0.6%

Minimum 5 values
Value  Count  Frequency (%)
1      550    0.2%
2      787    0.2%
3      1992   0.6%
4      7750   2.4%
5      19181  6.0%

Maximum 5 values
Value  Count  Frequency (%)
24     30     < 0.1%
23     3      < 0.1%
22     9      < 0.1%
21     2      < 0.1%
20     63     < 0.1%

Asthma
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      276884  86.6%
1      42865   13.4%

KidneyDisease
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      307975  96.3%
1      11774   3.7%

SkinCancer
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value  Count   Frequency (%)
0      289934  90.7%
1      29815   9.3%

Interactions

[64 pairwise interaction plots omitted]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
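
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the routine used to produce this report; the function name `pearson_r` is hypothetical. The 1/n factors of the covariance and of the two standard deviations cancel, so they are left out.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of products and squares; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(pearson_r(x, y))                         # exactly linear, increasing
print(pearson_r(x, [3 * v + 7 for v in y]))    # shifted and rescaled y
print(pearson_r(x, [-v for v in y]))           # exactly linear, decreasing
```

The second call illustrates the invariance under location and scale changes: shifting and rescaling y leaves r unchanged up to floating-point rounding.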

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
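
A minimal sketch of this recipe: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention and is an assumption here; the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic in x, but not linear
print(average_ranks([10, 20, 20, 30]))
print(spearman_rho(x, y))
```

Because y is a strictly increasing function of x, the ranks coincide and ρ is 1 (up to rounding), even though the relationship is not linear.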

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
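
The pair-counting definition above can be sketched directly. This variant, often called tau-a, counts tied pairs as neither concordant nor discordant; tie-corrected variants exist and may give different values on data with many ties, so treat this as an illustration of the definition rather than the exact routine behind this report. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair: 5 concordant, 1 discordant, 6 pairs
print(kendall_tau_a(x, y))
```

The quadratic loop is fine for a small example but would be slow on a dataset of this size.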

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Three missing-value plots omitted]

Sample

First rows

(row)   df_index  HeartDisease  BMI    Smoking  AlcoholDrinking  Stroke  PhysicalHealth  MentalHealth  DiffWalking  Sex  AgeCategory  Race  Diabetic  PhysicalActivity  GenHealth  SleepTime  Asthma  KidneyDisease  SkinCancer
0       0         0             16.60  1        0                0       3.0             30.0          0            0    57           5     1.0       1                 4          5.0        1       0              1
1       1         0             20.34  0        0                1       0.0             0.0           0            0    80           5     0.0       1                 4          7.0        0       0              0
2       2         0             26.58  1        0                0       20.0            30.0          0            1    67           5     1.0       1                 1          8.0        1       0              0
3       3         0             24.21  0        0                0       0.0             0.0           0            0    77           5     0.0       0                 2          6.0        0       0              1
4       4         0             23.71  0        0                0       28.0            0.0           1            0    42           5     0.0       1                 4          8.0        0       0              0
5       5         1             28.87  1        0                0       6.0             0.0           1            0    77           2     0.0       0                 1          12.0       0       0              0
6       6         0             21.63  0        0                0       15.0            0.0           0            0    72           5     0.0       1                 1          4.0        1       0              1
7       7         0             31.64  1        0                0       5.0             0.0           1            0    80           5     1.0       0                 2          9.0        1       0              0
8       8         0             26.45  0        0                0       0.0             0.0           0            0    80           5     NaN       0                 1          5.0        0       1              0
9       9         0             40.69  0        0                0       0.0             0.0           1            1    67           5     0.0       1                 2          10.0       0       0              0

Last rows

(row)   df_index  HeartDisease  BMI    Smoking  AlcoholDrinking  Stroke  PhysicalHealth  MentalHealth  DiffWalking  Sex  AgeCategory  Race  Diabetic  PhysicalActivity  GenHealth  SleepTime  Asthma  KidneyDisease  SkinCancer
319739  319785    0             31.93  0        1                0       0.0             0.0           0            1    67           3     0.0       1                 2          7.0        0       0              0
319740  319786    1             33.20  1        0                0       0.0             0.0           0            0    62           3     1.0       1                 4          8.0        1       0              0
319741  319787    0             36.54  0        0                0       7.0             0.0           0            1    32           3     0.0       0                 2          9.0        0       0              0
319742  319788    0             23.38  0        0                0       0.0             0.0           0            0    62           3     0.0       1                 0          6.0        0       0              0
319743  319789    0             22.22  0        0                0       0.0             0.0           0            0    21           3     0.0       1                 0          8.0        0       0              0
319744  319790    1             27.41  1        0                0       7.0             0.0           1            1    62           3     1.0       0                 1          6.0        1       0              0
319745  319791    0             29.84  1        0                0       0.0             0.0           0            1    37           3     0.0       1                 4          5.0        1       0              0
319746  319792    0             24.24  0        0                0       0.0             0.0           0            0    47           3     0.0       1                 2          6.0        0       0              0
319747  319793    0             32.81  0        0                0       0.0             0.0           0            0    27           3     0.0       0                 2          12.0       0       0              0
319748  319794    0             46.56  0        0                0       0.0             0.0           0            0    80           3     0.0       1                 2          8.0        0       0              0